Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rn/na-notebook #578

Open
wants to merge 8 commits into
base: dev
Choose a base branch
from
Open

Rn/na-notebook #578

wants to merge 8 commits into from

Conversation

rnayebi21
Copy link
Contributor

@rnayebi21 rnayebi21 commented Dec 7, 2024

Checklist

Please:

  • Make sure this PR is against "dev", not "main" (unless this is a release
    PR).
  • Request a review from one of the current main reviewers:
    brookslogan, nmdefries.
  • Makes sure to bump the version number in DESCRIPTION. Always increment
    the patch version number (the third number), unless you are making a
    release PR from dev to main, in which case increment the minor version
    number (the second number).
  • Describe changes made in NEWS.md, making sure breaking changes
    (backwards-incompatible changes to the documented interface) are noted.
    Collect the changes under the next release number (e.g. if you are on
    1.7.2, then write your changes under the 1.8 heading).
  • See DEVELOPMENT.md for more information on the development
    process.

Change explanations for reviewer

Added the reviewed na-notebook. Key differences from last time: added an example and explained complete(), as well as added the LOCF in version example.

Magic GitHub syntax to mark associated Issue(s) as resolved when this is merged into the default branch

  • Resolves #{issue number}

@rnayebi21 rnayebi21 requested a review from brookslogan December 7, 2024 00:54
Copy link
Contributor

@brookslogan brookslogan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments so far. Will revisit.

impute_locf <- function(data) {
data %>%
group_by(geo_value) %>%
mutate(across(where(is.numeric), ~ zoo::na.locf(.x, na.rm = FALSE), .names = "{.col}_locf")) %>%
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: use fill from tidyr to do this instead, to focus on a smaller set of packages. I guess we should probably include an arrange to make sure things are arranged by time_value within each geo_value as well.


### NAs from merging

First let's start with discussing the most common type of missing values that appeared in the context of my auxiliary signal project. When working with multiple signals each signal will likely begin recording at different times. In other words each signal's first data point $t_0$ will differ on the absolute time scale. As a result, when calling `epix_merge()` to combine multiple signals, the signals that started recording at a later point in time will have missing values for the time periods where the other signals were already recording. Here's a quick example
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: let's present this as more instructional. Here, that means just tweaking language, getting rid of "that appeared in the context of my auxiliary signal project", and then softening/rewording "the most common" since it may not be the most common in general.

(But in other parts we actually need to make it the content more instructional/relevant (use real data not buggy/artificial).)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: epix_merge() can introduce NAs both from differing min time_values and differing min versions, but

  • the description here is a little ambiguous about which it's referring to
  • the table is showing something more like an epi_df

Possible fix: instead of mentioning epix_merge() at this point, we could say this is an issue with some types of joins, and then transition into the epix_merge() example with some more discussion.


```{r latest_fn}
latest <- function(x) {
epix_as_of(x, max_version = max(x$versions_end))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
epix_as_of(x, max_version = max(x$versions_end))
epix_as_of(x, x$versions_end)

we renamed max_version, and versions_end is a scalar

geo_type = "state",
time_values = epirange(20200220, today),
geo_values = states,
issues = epirange(20201130, today)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest:
move to time_values = "*", issues = epirange(12340101, today) or some other absurdly early start issue

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and then maybe do something else to show people the range of time values and issues in the data

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants